
Conversation


@liangel-02 liangel-02 commented Nov 7, 2025

Summary
This PR adds variable length attention (varlen) support to the Llama 3 8b model in torchtitan. We replace the `use_flex_attn` flag with `attn_type` (one of `"sdpa"`, `"varlen"`, or `"flex"`). If `attn_type = "varlen"`, the attention module calls a compiled `varlen_attn` defined here.
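For context on what the varlen path computes: multiple sequences are packed along a single token dimension and attention is restricted to each sequence via cumulative sequence lengths. Below is a minimal, hedged reference sketch of those semantics using plain SDPA per sequence; it is not the PR's code, and the fused `varlen_attn` op computes the same thing in a single kernel.

```python
import torch
import torch.nn.functional as F

def varlen_attn_reference(q, k, v, cu_seqlens):
    """Reference semantics of varlen attention over packed sequences.

    q, k, v: [total_tokens, n_heads, head_dim], all sequences packed together.
    cu_seqlens: [num_seqs + 1] cumulative sequence lengths, e.g. [0, 3, 8].
    """
    out = torch.empty_like(q)
    for start, end in zip(cu_seqlens[:-1].tolist(), cu_seqlens[1:].tolist()):
        # View one packed sequence as [1, n_heads, seq_len, head_dim].
        qs, ks, vs = (t[start:end].transpose(0, 1).unsqueeze(0) for t in (q, k, v))
        o = F.scaled_dot_product_attention(qs, ks, vs, is_causal=True)
        out[start:end] = o.squeeze(0).transpose(0, 1)
    return out

# Two packed sequences of lengths 3 and 5, with 8 heads of dim 64.
cu_seqlens = torch.tensor([0, 3, 8])
q, k, v = (torch.randn(8, 8, 64) for _ in range(3))
print(varlen_attn_reference(q, k, v, cu_seqlens).shape)  # torch.Size([8, 8, 64])
```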

Testing
Ran loss and performance tests against flex attention. Loss is on par.

[Screenshot (2025-11-19): loss curves comparing varlen and flex attention]

Varlen is slightly slower than Flex due to CUDA kernel speeds (varlen currently calls into `flash_attention_forward`/`flash_attention_backward`).

|          | Varlen           | Flex             |
|----------|------------------|------------------|
| Forward  | 774us 357ns      | 722us 317ns      |
| Backward | 1ms 955us 916ns  | 1ms 558us 747ns  |
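For reference, kernel-level timings like those above can be collected with CUDA events; the sketch below is illustrative only and is not the benchmarking harness used for this PR. `attn_fn` stands in for either the varlen or flex attention call.

```python
import torch

def time_fwd_bwd(attn_fn, q, k, v, iters=100):
    # q, k, v are CUDA tensors with requires_grad=True.
    fwd_start, fwd_end = (torch.cuda.Event(enable_timing=True) for _ in range(2))
    bwd_start, bwd_end = (torch.cuda.Event(enable_timing=True) for _ in range(2))

    torch.cuda.synchronize()
    fwd_start.record()
    for _ in range(iters):
        out = attn_fn(q, k, v)
    fwd_end.record()
    torch.cuda.synchronize()
    fwd_ms = fwd_start.elapsed_time(fwd_end) / iters

    grad = torch.ones_like(out)
    bwd_start.record()
    for _ in range(iters):
        out = attn_fn(q, k, v)   # rebuild the graph each iteration
        out.backward(grad)
        q.grad = k.grad = v.grad = None
    bwd_end.record()
    torch.cuda.synchronize()
    # The backward loop includes a forward pass; subtract its average cost.
    bwd_ms = bwd_start.elapsed_time(bwd_end) / iters - fwd_ms
    return fwd_ms, bwd_ms
```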

@meta-cla bot added the CLA Signed label Nov 7, 2025
@liangel-02 liangel-02 force-pushed the test_varlen branch 3 times, most recently from eeecb63 to cad97e5 Compare November 12, 2025 22:49
@liangel-02 liangel-02 changed the title Test varlen adding variable length attention to llama 3 8b Nov 12, 2025
@liangel-02 liangel-02 changed the title adding variable length attention to llama 3 8b adding variable length attention to llama3 8b Nov 12, 2025
@liangel-02 liangel-02 requested a review from drisspg November 12, 2025 23:18
@fegin fegin left a comment

This implementation won't work with PP and is too intrusive to the model code. The packing logic should be hidden inside the inner attention.

@liangel-02 liangel-02 force-pushed the test_varlen branch 4 times, most recently from 55352a5 to 066ca02 Compare November 14, 2025 18:11
@liangel-02 liangel-02 requested a review from fegin November 14, 2025 18:11
@liangel-02 liangel-02 marked this pull request as ready for review November 14, 2025 18:14
@fegin fegin left a comment

LGTM, thanks for the update. I left some other comments; once they are addressed, this PR should be ready.

@liangel-02 liangel-02 force-pushed the test_varlen branch 2 times, most recently from a902cbe to de416f9 Compare November 17, 2025 18:05
@liangel-02 liangel-02 requested a review from fegin November 17, 2025 18:05
@tianyu-l tianyu-l left a comment

Thanks! I left some comments; please see if they make sense to you.

@liangel-02 liangel-02 force-pushed the test_varlen branch 4 times, most recently from caafc81 to 4d36560 Compare November 18, 2025 21:49
@liangel-02 liangel-02 force-pushed the test_varlen branch 4 times, most recently from 9380847 to 42c0c85 Compare November 19, 2025 22:33
@liangel-02 liangel-02 requested review from fegin and tianyu-l November 19, 2025 22:34
@tianyu-l tianyu-l left a comment

Left some more comments. If you'd like to focus on Llama 3 in this PR, that's fine with me too.

@liangel-02 liangel-02 force-pushed the test_varlen branch 4 times, most recently from 5528029 to 31c1c77 Compare November 20, 2025 17:35
@fegin fegin left a comment

LGTM, we can leave other models to other PR(s).

@liangel-02 liangel-02 force-pushed the test_varlen branch 4 times, most recently from b717da3 to 9c99fcb Compare November 20, 2025 19:11
@liangel-02 liangel-02 requested a review from tianyu-l November 20, 2025 19:46
xv,
self.head_dim,
attention_masks,
is_causal=True,
Contributor

This would fail? I think is_causal is no longer accepted.

Btw, it seems varlen is not tested in CI; can we add a test similar to https://github.com/pytorch/torchtitan/blob/main/tests/integration_tests/features.py#L336
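A hedged sketch of what such an entry might look like, modeled on the existing attention tests in features.py; the override flag and test names below are assumptions, not the final code (varlen may instead be exposed through a model flavor):

```python
# Hypothetical entry to append to the test list in tests/integration_tests/features.py.
OverrideDefinitions(
    [
        [
            "--model.attn_type='varlen'",  # assumed flag spelling
        ],
    ],
    "Variable length attention",
    "varlen_attn",
),
```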

@liangel-02 liangel-02 force-pushed the test_varlen branch 2 times, most recently from 1af38e5 to df22636 Compare November 21, 2025 16:45
@liangel-02 liangel-02 requested a review from tianyu-l November 21, 2025 18:03
@tianyu-l tianyu-l left a comment

LGTM.

We need to modify the `save_list` of SAC to save the result of varlen attn, to be consistent with the other attn implementations. This can be done in the next PR.
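For context, a hedged sketch of the follow-up being described: per-op SAC keeps a set of ops whose outputs are saved rather than recomputed, and the varlen attention op would be appended to it. The existing entries below mirror torchtitan's current list; the varlen op name is an assumption.

```python
import torch

# Ops whose outputs per-op SAC saves instead of recomputing (illustrative subset).
_save_list = {
    torch.ops.aten.mm.default,
    torch.ops.aten._scaled_dot_product_efficient_attention.default,
    torch.ops.aten._scaled_dot_product_flash_attention.default,
    # Assumption: whatever op the compiled varlen_attn dispatches to; the real
    # name depends on how varlen_attn is registered.
    torch.ops.aten._flash_attention_forward.default,
}
```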

[
[
"--parallelism.data_parallel_shard_degree=4",
"--activation_checkpoint.mode='full'",
Contributor

Let's use `per_op_sac` like the test above.
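A hedged sketch of the suggested override, swapping full AC for torchtitan's selective per-op activation checkpointing flags (values mirror the neighboring tests, not this PR's final diff):

```python
[
    "--parallelism.data_parallel_shard_degree=4",
    "--activation_checkpoint.mode='selective'",
    "--activation_checkpoint.selective_ac_option='op'",
],
```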

@tianyu-l tianyu-l merged commit f8fa21e into main Nov 21, 2025
10 of 12 checks passed
@tianyu-l tianyu-l deleted the test_varlen branch November 21, 2025 22:46
kiansierra added a commit to kiansierra/torchtitan-modal that referenced this pull request Nov 22, 2025